This document provides the RMarkdown behind the figures and interactive visualisations in our paper for anyone who wants to see how they were created or who wishes to extend them.
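We begin by generating all possible confusion matrices of a given size. A binary confusion matrix comprises four non-negative integers. An ordered sequence of \(k\) non-negative integers that sum to \(n\) is known as a weak composition of \(n\), so the confusion matrices of size \(n\) are exactly the four-part weak compositions of \(n\). Here is a function, based on code by Michel Billaud, that creates all \(k\)-element weak compositions of a total \(n\).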
library(dplyr) # provides %>%, bind_cols(), as_tibble() and mutate(), used below

makeAllWeakCompositions <- function(n, k) {
  # Initialise the matrix that will hold all compositions
  composition <- matrix(data = 0, nrow = choose(n + k - 1, k - 1), ncol = k)
  composition[1, k] <- n  # Set the first composition (0, ..., 0, n)
  current.row <- 1        # Set the current row to the first row
  last.nonzero <- k       # The last non-zero element of the current row is in position k
  # While the first element of the current row is less than n...
  while (composition[current.row, 1] < n) {
    # generate the next row
    next.row <- current.row + 1
    # copy the current row into the next row
    composition[next.row, ] <- composition[current.row, ]
    # turn  a b ... y z 0 0 ... 0        (z is the last non-zero element)
    # into  a b ... (y+1) 0 0 0 ... (z-1)
    last.nonzero <- max(which(composition[next.row, ] > 0))
    z <- composition[current.row, last.nonzero]
    composition[next.row, last.nonzero - 1] <- composition[current.row, last.nonzero - 1] + 1
    composition[next.row, last.nonzero] <- 0
    composition[next.row, k] <- z - 1
    current.row <- next.row
  }
  return(composition)
}
To demonstrate, here are the first six and the last six weak compositions of 5 into four parts:
makeAllWeakCompositions(5, 4) %>% head()
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    5
[2,]    0    0    1    4
[3,]    0    0    2    3
[4,]    0    0    3    2
[5,]    0    0    4    1
[6,]    0    0    5    0
makeAllWeakCompositions(5, 4) %>% tail()
      [,1] [,2] [,3] [,4]
[51,]    3    1    1    0
[52,]    3    2    0    0
[53,]    4    0    0    1
[54,]    4    0    1    0
[55,]    4    1    0    0
[56,]    5    0    0    0
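As a cross-check, here is a minimal recursive sketch that enumerates the same weak compositions. It is a swapped-in technique, not Billaud's iterative algorithm: the first part takes each value from 0 to \(n\) in turn, and the remaining \(k-1\) parts range over the weak compositions of what is left. The name weak_compositions_recursive() is hypothetical. Because the first part changes slowest, it should list the rows in the same lexicographic order as the output above.

```r
# Hypothetical recursive enumeration of weak compositions (a swapped-in
# technique, not Billaud's iterative algorithm).
weak_compositions_recursive <- function(n, k) {
  if (k == 1) {
    # A single part must take the whole total
    return(matrix(n, nrow = 1, ncol = 1))
  }
  blocks <- lapply(0:n, function(first) {
    # All weak compositions of the remainder into k - 1 parts...
    rest <- weak_compositions_recursive(n - first, k - 1)
    # ...each prefixed with the current value of the first part
    cbind(first, rest, deparse.level = 0)
  })
  # Stack the blocks in increasing order of the first part
  do.call(rbind, blocks)
}

wc <- weak_compositions_recursive(5, 4)
nrow(wc)               # 56
all(rowSums(wc) == 5)  # TRUE
wc[c(1, 51, 56), ]
```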
The number \(C_k^{'}(n)\) of weak compositions of \(n\) into \(k\) parts (i.e. compositions in which 0 is allowed) is given by (Weisstein, n.d.):
\[ \begin{align} C_k^{'}(n) &= \binom{n+k-1}{k-1}\\ &=\frac{(n+k-1)!}{n!(k-1)!} \end{align} \]
so, for \(k=4\), \[ \begin{align} C_4^{'}(0) &= 1\\ C_4^{'}(1) &= 4\\ C_4^{'}(2) &= 10\\ C_4^{'}(3) &= 20\\ C_4^{'}(4) &= 35\\ C_4^{'}(5) &= 56\\ &\vdots\\ C_4^{'}(100) &= 176\,851 \end{align} \]
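The closed form can be checked by brute force for small \(n\). The sketch below uses base R's expand.grid() to enumerate every \(k\)-tuple with entries in \(0,\dots,n\) and counts those that sum to \(n\). The helper names are hypothetical, and the approach only suits small \(n\), since it visits \((n+1)^k\) tuples.

```r
# Brute-force count of weak compositions of n into k parts: enumerate every
# k-tuple with entries in 0..n and keep those that sum to n.
count_weak_compositions <- function(n, k) {
  grid <- expand.grid(rep(list(0:n), k))  # all (n + 1)^k candidate tuples
  sum(rowSums(grid) == n)
}

# Closed form C'_k(n) = choose(n + k - 1, k - 1)
closed_form <- function(n, k) choose(n + k - 1, k - 1)

rbind(brute  = sapply(0:5, count_weak_compositions, k = 4),
      closed = sapply(0:5, closed_form, k = 4))
closed_form(100, 4)  # 176851 (brute force would need 101^4, about 10^8, tuples)
```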
Now that we can generate all possible confusion matrices of a given total, we can augment them with various performance metrics.
The following function returns a data frame (a tibble) representing all possible confusion matrices of size n, projected into three dimensions, with a range of performance metrics added as columns.
make.confmat <- function(n) {
  # Note: zdiv() and MCC() are helper functions that are not shown in this document
  # This matrix is used to project the four-dimensional confusion matrix into three dimensions
  project3d <- matrix(
    c(
         0,    0,    1, # TP
         0,    1,    0, # FP
         1,    0,    0, # FN
      -1/3, -1/3, -1/3  # TN
    ), byrow = TRUE, nrow = 4
  )
  makeAllWeakCompositions(n, 4) -> abcd          # All confusion matrices of size n
  colnames(abcd) <- c("TP", "FP", "FN", "TN")    # with columns named after confusion matrix elements
  abcd %*% project3d -> xyz                      # ...projected into 3D
  colnames(xyz) <- c("x", "y", "z")              # with columns named after the three dimensions
  bind_cols(as_tibble(abcd), as_tibble(xyz)) %>% # ...bound side by side
    mutate(                                      # and augmented with...
      text        = sprintf("%2d %2d\n%2d %2d", TP, FP, FN, TN), # Label for plotly
      Pos         = TP + FN,                     # Number of actual positives
      Neg         = FP + TN,                     # Number of actual negatives
      TPR         = TP / Pos,                    # True Positive Rate
      FPR         = FP / Neg,                    # False Positive Rate
      PLR         = TPR / FPR,                   # Positive Likelihood Ratio (LR+)
      TNR         = TN / Neg,                    # True Negative Rate
      FNR         = FN / Pos,                    # False Negative Rate
      NLR         = FNR / TNR,                   # Negative Likelihood Ratio (LR-)
      DOR         = PLR / NLR,                   # Diagnostic Odds Ratio
      prior.O     = Pos / Neg,                   # Prior odds of actual class being X
      prior.P     = zdiv(Pos, Neg),              # Prior prob of actual class being X
      post.O      = TP / FP,                     # Posterior odds that actual class is X
      post.P      = zdiv(TP, FP),                # Posterior probability that actual class is X
      prior.O.n   = Neg / Pos,                   # Prior odds of actual class NOT being X
      prior.P.n   = zdiv(Neg, Pos),              # Prior prob of actual class NOT being X
      post.O.n    = TN / FN,                     # Posterior odds that actual class is NOT X
      post.P.n    = zdiv(TN, FN),                # Posterior probability that actual class is NOT X
      MCC         = MCC(TP, FP, FN, TN),         # Matthews correlation coefficient
      logDOR      = log(DOR),                    # log of DOR
      slogDOR     = logDOR / log((Pos - 1) * (Neg - 1)), # scaled log of DOR
      J           = TPR + TNR - 1,               # Youden's J; balanced accuracy is (J + 1)/2
      Acc         = (TP + TN) / (Pos + Neg),     # Accuracy
      F1          = 2 * TP / (2 * TP + FP + FN), # F1
      Markedness  = post.P + post.P.n - 1,       # Markedness
      g.mean      = sqrt(TPR * TNR),             # Geometric mean
      Prev.Thresh = sqrt(FPR) / (sqrt(TPR) + sqrt(FPR)), # Prevalence threshold
      Threat.Scr  = TP / (TP + FN + FP),         # Threat score
      Fowlkes.M   = sqrt(post.P * TPR)           # Fowlkes-Mallows index
    )
}
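To see what a single row of this data frame holds, here is a base-R sketch that computes a handful of the same columns for one confusion matrix, \(\begin{bmatrix}16&8\\4&32\end{bmatrix}\), which also appears later in this document. The helper confmat_metrics() is hypothetical and does not use the zdiv() or MCC() helpers; PPV is computed directly as TP/(TP+FP), i.e. precision, the quantity the post.P column describes.

```r
# Hypothetical helper: a few of the make.confmat() columns for one matrix.
confmat_metrics <- function(TP, FP, FN, TN) {
  Pos <- TP + FN                  # actual positives
  Neg <- FP + TN                  # actual negatives
  TPR <- TP / Pos                 # true positive rate
  FPR <- FP / Neg                 # false positive rate
  TNR <- TN / Neg                 # true negative rate
  FNR <- FN / Pos                 # false negative rate
  PLR <- TPR / FPR                # positive likelihood ratio (LR+)
  NLR <- FNR / TNR                # negative likelihood ratio (LR-)
  c(TPR = TPR, FPR = FPR, PLR = PLR, NLR = NLR,
    DOR = PLR / NLR,              # diagnostic odds ratio
    J   = TPR + TNR - 1,          # Youden's J
    Acc = (TP + TN) / (Pos + Neg),
    F1  = 2 * TP / (2 * TP + FP + FN),
    PPV = TP / (TP + FP))         # precision
}

round(confmat_metrics(TP = 16, FP = 8, FN = 4, TN = 32), 3)
```

For this matrix TPR = 0.8, FPR = 0.2, DOR = 16 and \(F_1 = 32/44 \approx 0.727\).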
The helper function plot.simplex() generates an interactive 3D visualisation of a set of confusion matrices, with each point coloured by a chosen metric. Here it is used to show the 3D projection of the binary confusion matrices of size 100. Each point corresponds to a unique confusion matrix and is coloured by the value of that matrix's Matthews Correlation Coefficient (MCC). For reference, we label the four extreme points corresponding to all true positives (TP=100), all false negatives (FN=100), etc., and connect those vertices to give an impression of the regular tetrahedral lattice (i.e., the 3-simplex) formed by the projected points. In total, there are \(\binom{100+4-1}{4-1}=176\,851\) different binary confusion matrices of size 100. Rather than show them all, we have taken three slices through the lattice: from back to front, the rectangular lattices of points correspond to confusion matrices with \(p = 20, 50, 90\) actual positives, respectively.
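The claim that the projected points form a regular tetrahedral lattice can be checked on its four vertices. The sketch below copies the project3d matrix from make.confmat(), maps the four extreme confusion matrices of size 100 into 3D, and compares all six edge lengths with \(100\sqrt{2}\) using base R's dist(). The variable names are illustrative.

```r
# project3d as defined in make.confmat(): rows map TP, FP, FN, TN into 3D
project3d <- matrix(
  c(   0,    0,    1,   # TP
       0,    1,    0,   # FP
       1,    0,    0,   # FN
    -1/3, -1/3, -1/3),  # TN
  byrow = TRUE, nrow = 4
)

N <- 100
# Rows of N * I_4 are the extreme matrices: all TP, all FP, all FN, all TN
vertices <- diag(N, 4) %*% project3d
rownames(vertices) <- c("TP", "FP", "FN", "TN")
vertices

# A regular tetrahedron has six equal edges; here each should be N * sqrt(2)
edges <- as.vector(dist(vertices))
all.equal(edges, rep(N * sqrt(2), 6))  # TRUE
```

The \(-\tfrac13\) entries are what place the TN vertex at the same distance, \(100\sqrt2\), from the other three vertices.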
plot.simplex(confmat.100, metric="MCC")
Mouse over the tetrahedron, then click and drag to change its orientation. Click on the text Pos==20 to toggle that slice of confusion matrices on and off.
Here is the same confusion simplex, this time coloured by accuracy.
plot.simplex(confmat.100, metric="Acc")
Here is a confusion matrix representing \(N=a+b+c+d\) examples \[ \begin{bmatrix} \mathrm{TP} & \mathrm{FP}\\ \mathrm{FN} & \mathrm{TN} \end{bmatrix}= \begin{bmatrix} a & b\\ c & d \end{bmatrix} \] in which there are \(p=a+c\) actual positives and \(n=b+d\) actual negatives.
The Matthews Correlation Coefficient is defined to be \[ \begin{align} \mathrm{MCC}(a, b, c, d) &=\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\\ &=\frac{ad-(n-d)(p-a)}{\sqrt{(a+n-d)pn(p-a+d)}}. \end{align} \]
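As a quick worked example, both forms of the definition can be evaluated for the confusion matrix \(\begin{bmatrix}16&8\\4&32\end{bmatrix}\), which has \(p=20\) and \(n=40\). By hand, \(ad-bc = 512-32 = 480\) and the denominator is \(\sqrt{24\cdot20\cdot40\cdot36}=\sqrt{691\,200}\), so \(\mathrm{MCC}=\sqrt{1/3}\approx 0.577\). The function names below are hypothetical.

```r
# First form: MCC from the four cells a = TP, b = FP, c = FN, d = TN
mcc_abcd <- function(a, b, c, d) {
  (a * d - b * c) / sqrt((a + b) * (a + c) * (b + d) * (c + d))
}

# Second form: b and c eliminated via b = n - d and c = p - a
mcc_apnd <- function(a, p, n, d) {
  (a * d - (n - d) * (p - a)) / sqrt((a + n - d) * p * n * (p - a + d))
}

mcc_abcd(16, 8, 4, 32)    # 0.5773503, i.e. sqrt(1/3)
mcc_apnd(16, 20, 40, 32)  # same value, with p = 20 and n = 40
```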
For given numbers of positives (\(p\)) and negatives (\(n\)), MCC takes the value \(k\), for any \(-1 \leq k \leq 1\), along the contour on which the number of true positives \(a\) depends on \(k\) and the number of true negatives \(d\) as follows: \[ a(k, p, n, d) = \left\{ \begin{array}{ c l } \frac{1}{2 (k^2 p + n)} \left( +\sqrt{ \frac {k^2 p (n + p)^2 (4d(n-d) + k^2 n p)} {n} } + 2dp(k^2 - 1) + k^2p(p - n) + 2np \right), & k \geq 0\\ \frac{1}{2 (k^2 p + n)} \left( -\sqrt{ \frac {k^2 p (n + p)^2 (4d(n-d) + k^2 n p)} {n} } + 2dp(k^2 - 1) + k^2p(p - n) + 2np \right), & k < 0 \end{array} \right. \]
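A numerical check of this formula: the sketch below implements \(a(k,p,n,d)\) as written, substitutes the result back into the second form of the MCC definition, and prints the resulting MCC, which should equal \(k\). The choice \(p=20\), \(n=80\), \(d=60\) and the values of \(k\) are arbitrary, and the function names are hypothetical.

```r
# a(k, p, n, d): true positives on the MCC = k contour, transcribed from above
mcc_contour_a <- function(k, p, n, d) {
  root <- sqrt(k^2 * p * (n + p)^2 * (4 * d * (n - d) + k^2 * n * p) / n)
  sgn  <- ifelse(k >= 0, 1, -1)  # "+" branch for k >= 0, "-" branch for k < 0
  (sgn * root + 2 * d * p * (k^2 - 1) + k^2 * p * (p - n) + 2 * n * p) /
    (2 * (k^2 * p + n))
}

# MCC in terms of a, p, n and d (second form of the definition above)
mcc_apnd <- function(a, p, n, d) {
  (a * d - (n - d) * (p - a)) / sqrt((a + n - d) * p * n * (p - a + d))
}

p <- 20; n <- 80; d <- 60
for (k in c(-0.2, 0, 0.5)) {
  a <- mcc_contour_a(k, p, n, d)
  cat(sprintf("k = %5.2f   a = %8.4f   MCC = %7.4f\n", k, a, mcc_apnd(a, p, n, d)))
}
```

Where MCC \(=k\) is not attainable at the chosen \(d\), the formula returns an \(a\) outside \([0,p]\); for example, \(k=-0.5\) with \(d=60\) gives \(a\approx-4.13\) here.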
To illustrate these contours, here is an orthographic projection of the slice of points from the confusion simplex shown above where \(p=20\) and \(n=80\), coloured by the value of the Matthews Correlation Coefficient (MCC). The continuous lines indicate the contours of MCC at \(k = -0.9, -0.8, \dots, 0.9\). Note that while MCC can be calculated for continuous arguments, empirical confusion matrices give rise to a finite set of \((p+1)\times(n+1)\) arguments, corresponding to the points in this 2D lattice.
…and here are the same points and contours scaled to fit within ROC space. ROC curves plot a classifier's true positive rate against its false positive rate in the unit square \([0,1]\times[0,1]\), and for empirical confusion matrices both rates are rational. Moving to ROC space amounts to rescaling the \(x\)-axis of (a) by a factor of \(\tfrac1n\) and the \(y\)-axis by \(\tfrac1p\). Again, the contours of the MCC performance metric are defined continuously, but empirical confusion matrices can only take on values at the discrete points in this plot, which have \(n+1=81\) possible \(x\)-values \((0, \tfrac1n, \tfrac2n, \dots, 1)\) and \(p+1=21\) possible \(y\)-values \((0, \tfrac1p, \tfrac2p, \dots, 1)\).
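The lattice itself is straightforward to construct. This sketch uses base R's expand.grid() to build every \((\mathrm{FP},\mathrm{TP})\) pair for \(p=20\) and \(n=80\), rescales it to \((\mathrm{FPR},\mathrm{TPR})\), and attaches the MCC of each point, which is undefined (NaN) where one of the predicted classes is empty. The column names are illustrative.

```r
# All ROC lattice points for p positives and n negatives
p <- 20; n <- 80
roc <- expand.grid(FP = 0:n, TP = 0:p)
roc$FPR <- roc$FP / n  # x-axis of (a) rescaled by 1/n
roc$TPR <- roc$TP / p  # y-axis of (a) rescaled by 1/p

# MCC at each point, with FN = p - TP and TN = n - FP; 0/0 gives NaN
roc$MCC <- with(roc, (TP * (n - FP) - FP * (p - TP)) /
                  sqrt((TP + FP) * p * n * ((p - TP) + (n - FP))))

nrow(roc)                # 1701 = (p + 1) * (n + 1)
length(unique(roc$FPR))  # 81 possible x-values
length(unique(roc$TPR))  # 21 possible y-values
sum(is.nan(roc$MCC))     # 2: predicting everything negative or everything positive
```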
Each panel shows all the possible points in ROC space for confusion matrices with 20 positive and 40 negative examples (top row) and with 20 positive and 41 negative examples (bottom row). Each point is coloured by the multiplicity of its performance metric value, i.e. the number of points in that panel (confusion matrices with those totals) that share the value. Three performance metrics are presented: MCC (left), balanced accuracy (BA, middle) and \(F_1\) (right). Performance metric contours are shown in the background, coloured by their value. Note that one additional negative example changes the configuration of possible points in ROC space so that each possible MCC and BA value is unique (bottom left and middle), whereas the multiplicity of \(F_1\) values remains much the same (bottom right).
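The effect of the extra negative example on balanced accuracy can be counted exactly. Since \(\mathrm{BA}=\tfrac12\left(\tfrac ap+\tfrac dn\right)=\tfrac{na+pd}{2pn}\), the integer \(na+pd\) identifies each BA value, which avoids spurious floating-point ties. This sketch, with a hypothetical helper name, tabulates the multiplicities for \(n=40\) and \(n=41\); it covers BA only, not MCC or \(F_1\).

```r
# Multiplicity of balanced-accuracy values over all ROC lattice points.
# BA = (n*a + p*d) / (2*p*n), so the integer n*a + p*d is an exact key.
ba_multiplicity <- function(p, n) {
  grid <- expand.grid(a = 0:p, d = 0:n)  # a = TP, d = TN
  table(n * grid$a + p * grid$d)         # how many points share each BA value
}

m40 <- ba_multiplicity(20, 40)
m41 <- ba_multiplicity(20, 41)
c(points = sum(m40), distinct = length(m40), max = max(m40))  # 861, 81, 21
c(points = sum(m41), distinct = length(m41), max = max(m41))  # 882, 881, 2
```

With \(n=40\), up to 21 points share one BA value (those on the chance diagonal, \(\mathrm{BA}=\tfrac12\)). With \(n=41\), because \(\gcd(20,41)=1\), the only shared value in this count is \(\mathrm{BA}=\tfrac12\) at the two trivial classifiers in the corners \((0,0)\) and \((1,1)\).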
Here are two ways to show confusion matrix pmfs and performance metric contours in ROC space. Both plots show the posterior predictive pmf of confusion matrices under a beta-binomial model of uncertainty for a classifier observed to produce the confusion matrix \[\begin{bmatrix}16&8\\4&32\end{bmatrix}\] (so \(p=20\), \(n=40\), TPR \(=0.8\) and FPR \(=0.2\)). The left plot uses circle areas to represent probability mass; the right plot uses ridge lines. In the background are the contours of the \(F_1\) performance metric; in black are the contours \(F_1=\tfrac{4}{10}\) and \(F_1=\tfrac{2}{3}\), along each of which lie 11 points in ROC space.
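Here is a sketch of the ingredients of these plots, under assumptions that the text does not spell out: a uniform Beta(1, 1) prior on each of TPR and FPR, independence between the two rates, and predicted confusion matrices with the same \(p=20\) and \(n=40\) as the observed one. Base R has no beta-binomial density, so dbetabinom() below is written with lchoose() and lbeta(). The second part counts the lattice points on an \(F_1\) contour exactly, using \(F_1 = \tfrac{2a}{2a+b+c} = \tfrac{2a}{a+p+b}\). All helper names are hypothetical.

```r
# Beta-binomial pmf via base R's lchoose() and lbeta()
dbetabinom <- function(x, size, alpha, beta) {
  exp(lchoose(size, x) + lbeta(x + alpha, size - x + beta) - lbeta(alpha, beta))
}

# Observed matrix [16 8; 4 32]
TP <- 16; FN <- 4; FP <- 8; TN <- 32
p <- TP + FN; n <- FP + TN

# ASSUMPTIONS: Beta(1, 1) priors, independent rates, same p and n when predicting
pmf_TP <- dbetabinom(0:p, p, 1 + TP, 1 + FN)
pmf_FP <- dbetabinom(0:n, n, 1 + FP, 1 + TN)
joint  <- outer(pmf_TP, pmf_FP)  # joint[i, j] = P(TP = i - 1, FP = j - 1)
sum(joint)                       # 1, up to rounding

# Lattice points (TP = a, FP = b) on the contour F1 = q_num / q_den,
# compared with exact integer arithmetic
f1_contour_points <- function(q_num, q_den, p, n) {
  g <- expand.grid(a = 0:p, b = 0:n)
  g[2 * g$a * q_den == q_num * (g$a + p + g$b), ]
}
nrow(f1_contour_points(4, 10, p, n))  # 11
nrow(f1_contour_points(2, 3, p, n))   # 11

# Predictive probability mass lying exactly on the F1 = 2/3 contour
pts <- f1_contour_points(2, 3, p, n)
sum(joint[cbind(pts$a + 1, pts$b + 1)])
```

Solving \(F_1=\tfrac23\) gives \(b=2a-20\) for \(10\le a\le20\), and \(F_1=\tfrac{4}{10}\) gives \(b=4a-20\) for \(5\le a\le15\), which is where the 11 points on each contour come from.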
Reducing uncertainty in performance metrics requires more data to increase the precision of the predictive distribution of confusion matrices. These four contour plots show the posterior predictive pmfs (under a beta-binomial model of uncertainty) after observing confusion matrices of increasing size but with the same false and true positive rates (0.2 and 0.8). From left to right, the size of the confusion matrix increases by a factor of 4 and the heights and widths of the contours are halved, as expected when the spread of an estimated rate scales as \(1/\sqrt{N}\).
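The halving can be reproduced from the beta-binomial variance. The sketch below computes the standard deviation of the predicted true positive rate after observing \(N\) positives with a true positive rate of 0.8, for \(N\) growing by factors of 4. It assumes a uniform Beta(1, 1) prior and predicted matrices of the same size as the observed ones; the sizes 20, 80, 320 and 1280 are illustrative, not necessarily those used in the figure, and the function name is hypothetical.

```r
# Standard deviation of the predicted rate TP / N under a beta-binomial
# posterior predictive, after observing N positives with rate r.
# ASSUMPTIONS: Beta(1, 1) prior; predicted and observed sizes are both N.
sd_pred_rate <- function(N, r) {
  alpha <- 1 + r * N        # 1 + observed TP
  beta  <- 1 + (1 - r) * N  # 1 + observed FN
  s <- alpha + beta
  var_count <- N * alpha * beta * (s + N) / (s^2 * (s + 1))  # beta-binomial variance
  sqrt(var_count) / N       # convert from counts to a rate
}

N <- 20 * 4^(0:3)           # illustrative sizes 20, 80, 320, 1280
widths <- sd_pred_rate(N, r = 0.8)
round(widths[-1] / widths[-length(widths)], 3)  # successive ratios, close to 0.5
```

The same argument applies to the false positive rate, with \(N\) the number of negatives.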